Differentiation is a key cellular process in normal tissue development that is significantly altered in cancer. 
Although molecular signatures characterising pluripotency and multipotency exist, there is, as yet, no single 
quantitative mark of a cellular sample's position in the global differentiation hierarchy. Here we adopt a 
systems view and consider the sample's network entropy, a measure of signaling pathway promiscuity, 
computable from a sample's genome-wide expression profile. We demonstrate that network entropy 
provides a quantitative, in-silico, readout of the average undifferentiated state of the profiled cells, 
recapitulating the known hierarchy of pluripotent, multipotent and differentiated cell types. Network 
entropy further exhibits dynamic changes in time course differentiation data, in line with a sample's 
differentiation stage. In disease, network entropy predicts a higher level of cellular plasticity in cancer stem 
cell populations compared to ordinary cancer cells. Importantly, network entropy also allows identification 
of key differentiation pathways. Our results are consistent with the view that pluripotency is a statistical 
property defined at the cellular population level, correlating with intra-sample heterogeneity, and driven by 
the degree of signaling promiscuity in cells. In summary, network entropy provides a quantitative measure 
of a cell's undifferentiated state, defining its elevation in Waddington's landscape. 



The observed diversity of mature cells and human tissues arises as a result of a complex, intricate program of 
cellular differentiation, ultimately originating from (pluripotent) embryonic stem cells 1. Although systems 
biology principles underpinning the transitions between specific cellular states, such as pluripotency and 
progenitor states, are in the process of being elucidated 2-5, much remains to be learned. In the case of 
hematopoiesis, one of the best understood developmental systems, the full repertoire of transcription factors 
and signaling pathways dictating cell-fate is still unknown 5-9. Other studies have focused on characterising 
the pluripotent and progenitor states in terms of genome-wide gene expression 10-15, DNA methylation and 
chromatin state profiles 16-20. Although these molecular signatures can discriminate cells of specific 
differentiation stages from each other, there is, as yet, no single quantitative measure that can correctly 
place a sample within the global differentiation hierarchy. Rephrased in the context of Waddington's 
differentiation landscape 21, we do not yet have a molecular measure that can represent the energy potential, 
i.e. the elevation, in Waddington's landscape. 

Recently, it has been proposed that pluripotency, and more generally, the undifferentiated state, is an emergent 
statistical property of a population of cells 22-24, not well-defined at the single-cell level. Specifically, it has been 
argued that high cellular diversity underpins the pluripotent or multipotent capacity of stem cell populations, 
with differentiated cell populations representing a more uniform synchronised state 22. Motivated by this, we here 
explore a systems property of a cellular sample, called network entropy, in the context of cellular differentiation. 
At the single-cell level network entropy can be thought of as an approximate measure of signaling pathway 
promiscuity 22,25-27. Thus, a highly undifferentiated cell, such as a pluripotent stem cell, would have a high network 
entropy since it must maintain the option to initiate the activation of 
a large number of different signaling pathways associated with 
commitment to diverse cell fates 6. In contrast, a terminally differentiated 
cell would have a low network entropy, since it must maintain 
activation of a few pathways specific to its fate. At the population level, 
high network entropy would thus imply increased cellular heterogeneity, 
since the increased signaling promiscuity results in an increased 
stochasticity across single cells. Thus, we posited that network 
entropy would provide a direct molecular correlate of the 
undifferentiated state of a cellular sample, allowing us to place an arbitrary 
sample at its appropriate elevation in Waddington's landscape. 

To test our hypothesis, we here compute sample-specific network 
entropies for a large number of gene expression data sets relevant to 
cellular differentiation, reprogramming and cancer, encompassing 
over 800 samples, including cell-lines and primary tissue. Our 
key findings are: (i) network entropy is a highly accurate 
discriminator of pluripotent and non-pluripotent cell-types, (ii) it can 
further discriminate cellular states of varying degrees of multipotency 
within distinct lineages, (iii) it provides a more robust and general 
measure of a cell's position in the global differentiation hierarchy 
than gene expression signatures, and does so independently of cell 
proliferation, and (iv) it predicts a higher cellular heterogeneity in 
cancer stem cells compared to ordinary cancer cells. 

Results 

Construction and rationale of network entropy. Computing network 
entropy requires estimation of the signaling/interaction 
probabilities of proteins in a given sample. Thus, we integrated the 
gene expression profile of a given sample with a comprehensive 
protein interaction signaling network (PIN) (see SI 28), using the 
mass-action principle to construct a sample-specific stochastic 
matrix $p_{ij}$, where i and j label two distinct genes. The stochastic 
matrix provides a rough proxy for the interaction probabilities 
present in the given sample and its construction is based on the 
assumption that two genes known to interact at the protein level 
will have a greater interaction probability when they are both 
highly expressed (see SI). From the stochastic matrix, the network 
entropy can be calculated as the entropy rate 29,30 

$$S_R = \sum_i \pi_i S_i \qquad (1)$$

where $S_i$ is the local entropy of node (gene/protein) i and where $\pi_i$ is 
the i'th element of the stationary distribution of $p_{ij}$ (i.e. $\pi p = \pi$, see 
Methods, SI). Thus, the entropy rate gives a steady state average 
measure of the uncertainty (or promiscuity) in signaling 
information flow over the network. To facilitate comparison of entropy rates 
obtained from samples profiled on different expression arrays, values 
were always normalised to the maximum possible entropy rate of a 
given integrated network (SI, fig. S1). 
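
The construction above can be sketched on a toy network. This is a minimal illustration under stated assumptions, not the authors' R implementation: the four gene names and expression values are made up, and the stationary distribution is obtained from the standard closed form for a random walk on an undirected weighted graph (node strength divided by total strength) rather than from a general eigenvector computation.

```python
import math

def stochastic_matrix(expr, neighbours):
    """Mass-action transition probabilities p[i][j] = E_j / sum of E_k over neighbours k of i."""
    p = {}
    for i, nbrs in neighbours.items():
        total = sum(expr[k] for k in nbrs)
        p[i] = {j: expr[j] / total for j in nbrs}
    return p

def local_entropy(p_i):
    """Unnormalised local entropy S_i = -sum_j p_ij log p_ij (natural log)."""
    return -sum(q * math.log(q) for q in p_i.values() if q > 0)

def stationary_distribution(expr, neighbours):
    """pi_i proportional to E_i * sum of E_k over neighbours k of i.

    Valid because the edge weights E_i * E_j are symmetric, so the walk is reversible.
    """
    strength = {i: expr[i] * sum(expr[k] for k in nbrs) for i, nbrs in neighbours.items()}
    z = sum(strength.values())
    return {i: s / z for i, s in strength.items()}

def entropy_rate(expr, neighbours):
    """S_R = sum_i pi_i S_i, as in equation (1)."""
    p = stochastic_matrix(expr, neighbours)
    pi = stationary_distribution(expr, neighbours)
    return sum(pi[i] * local_entropy(p[i]) for i in neighbours)

# Hypothetical toy PIN (undirected) and made-up normalised expression values.
neighbours = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
expr = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 0.5}
sr = entropy_rate(expr, neighbours)
print(f"entropy rate of toy sample: {sr:.4f}")
```

The value printed here is not yet normalised to the maximum attainable entropy rate of the network.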

We posited that the entropy rate of a sample (e.g. a cell-line), as 
computed above, would capture the average level of signaling path- 
way promiscuity and hence of the cellular heterogeneity in the sam- 
ple. Under this model, highly undifferentiated and plastic cells, such 
as stem cells, would be characterised by a state of high network 
entropy, allowing them the option to differentiate into diverse cell 
lineages (Fig. 1A). Similarly, since differentiation implicates activa- 
tion of specific molecular signaling pathways, this activation would 
lead to a reduction in the uncertainty/promiscuity of information 
flow, i.e. a low entropy state (Fig. 1A). 

Figure 1 | Network entropy as the energy potential in Waddington's landscape. (A) Illustration of network entropy's role in cellular differentiation. The 
z-axis represents the network entropy rate (SR) of a cell, which is a measure of the promiscuity/redundancy in the signaling patterns within the cell. The 
two-dimensional plane spanned by the x- and y-axes represents gene expression state/phase space. We model a cell in a pluripotent stem-cell like state as 
being in a corresponding shallow attractor in phase space, characterised by increased signaling promiscuity (high network entropy), thus allowing each 
cell in the population to explore more freely the underlying phase space, resulting in a high cellular diversity. In contrast, a terminally differentiated cell is 
defined by activation of specific signaling pathway(s), corresponding to less uncertainty in how signals flow in the network (a state of low entropy). 
Cells in this state are in deep attractors and cellular diversity at the population level is low. (B) Simulation of pathway activation in a realistic protein 
interaction network (only a small subnetwork is shown). On the left, edge weights are defined equally, so that the random walk on the network is unbiased. 
On the right, a specific pathway is activated by increasing the relative weights of edges connecting the genes in the pathway (shown in dark red). The lower 
panel compares the entropy rate (SR) of the unbiased state, representing a highly promiscuous poised cellular state (magenta diamond), to the entropy 
rates obtained by separately activating each individual gene in the network ( > 1000 perturbations, "Commt(Pert.)"), and to the entropy rates obtained by 
activating whole signal transduction pathways (100 pathways, "Commt.(Path)"). Binomial test P-values are given. 

As a proof of concept that the entropy rate does indeed measure 
the level of signaling promiscuity we first devised a simulation model 
(SI). We compared the entropy rate of our PIN with weights defined 
by a uniform stochastic matrix (i.e. one with $p_{ij} \propto 1/k_i$, where $k_i$ is the 
degree of node i) representing a promiscuous poised state, to the 
entropy rate obtained by randomly activating individual genes and 
specific signal transduction pathways in the network (SI, Fig. 1B). In 
the case where individual genes were activated, this led, in 
approximately 70% of perturbations, to a reduction in the global entropy rate 
(Binomial P < 0.001, Fig. 1B). However, in the case where whole 
signaling pathways were activated, the reduction in the entropy rate 
was observed in 85% of cases (Binomial $P < 10^{-10}$, Fig. 1B), 
consistent with a substantially lower uncertainty in the information flow. 
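
The idea behind this simulation can be illustrated with a small sketch. Everything here is an assumption made for illustration: the network is a hypothetical complete graph on five genes, the "pathway" is an arbitrary chain of three of them, and the boost factor of 5 is made up. On a regular graph such as this one the unbiased walk attains the maximum entropy rate, so any bias lowers it; on the real PIN this is not guaranteed, which is consistent with reductions being reported in only about 70% and 85% of perturbations.

```python
import math

def entropy_rate(weights):
    """Entropy rate of the random walk on an undirected weighted graph.

    weights maps node -> {neighbour: symmetric positive edge weight}.
    """
    strength = {i: sum(w.values()) for i, w in weights.items()}
    total = sum(strength.values())
    rate = 0.0
    for i, nbrs in weights.items():
        pi_i = strength[i] / total  # stationary probability of node i
        s_i = -sum((w / strength[i]) * math.log(w / strength[i]) for w in nbrs.values())
        rate += pi_i * s_i
    return rate

def activate(weights, pathway, boost):
    """Copy the network and multiply the weights of edges linking consecutive pathway genes."""
    new = {i: dict(nbrs) for i, nbrs in weights.items()}
    for a, b in zip(pathway, pathway[1:]):
        new[a][b] *= boost
        new[b][a] *= boost
    return new

# Hypothetical toy network: complete graph on five genes with equal edge weights.
genes = ["g0", "g1", "g2", "g3", "g4"]
unbiased = {i: {j: 1.0 for j in genes if j != i} for i in genes}
activated = activate(unbiased, ["g0", "g1", "g2"], boost=5.0)

sr_unbiased = entropy_rate(unbiased)
sr_activated = entropy_rate(activated)
print(f"unbiased: {sr_unbiased:.4f}, pathway activated: {sr_activated:.4f}")
```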

Network entropy quantifies the level of multipotency. Based on the 
simulation results, we sought to determine if network entropy could 
discriminate biological samples that differ in terms of their signaling 
promiscuity. Thus, we computed the network entropy rate of 
samples in the "stem cell matrix" (SCM), a compendium of 
219 samples (mostly cell-lines), all profiled with the same Illumina 
arrays, 59 of which were deemed pluripotent, with the rest (160) 
deemed non-pluripotent 11. We observed that network entropy was 
significantly higher in the cell-lines deemed pluripotent ($P < 10^{-10}$, 
Fig. 2A). To provide an independent benchmark we also computed a 
t-test based pluripotency score (TPSC, SI), constructed from an 
independent 19-gene pluripotency expression signature, 
containing important pluripotency markers such as NANOG and 
LIN28A 12. The TPSC was also significantly 
higher in the pluripotent cell lines (SI, fig. S2), and both measures 
were significantly correlated, confirming that network entropy is 
indeed a marker of pluripotency (Fig. 2B). In an independently 
generated data set profiling 107 human embryonic and 52 induced 
pluripotent stem cell lines, as well as 32 differentiated tissue 
samples 31, the entropy rate achieved 100% accuracy in 
discriminating pluripotent from differentiated samples (Figs. 2C-D). Crucially, 
all these results were independent of cell proliferation, as we verified 
by removing cell proliferation and cycling genes 32 from the network 
(see SI, figs. S3-S4). Furthermore, passage number and sex did not 
have noticeable effects on the entropy rate as assessed in 107 human 
embryonic stem cell (hESC) lines (SI, fig. S5). Consistent with 
network entropy being a marker of pluripotency we observed that 
induced pluripotent stem cell samples exhibited high entropy values, 
similar to those of hESCs, and significantly higher than those of their 
parental differentiated cells (P < 0.0001, SI, figs. S6-S7). 

Figure 2 | Network entropy correlates with pluripotency. (A) Normalised 
entropy rates (SR/maxSR, y-axis) between the 59 pluripotent and 160 
non-pluripotent cell-lines from the SCM compendium (219 samples). P-value 
is from a Wilcoxon rank sum test. (B) Scatterplot of the entropy rate vs 
pluripotency score, where values for replicates of each cell type have been 
averaged. Linear regression P-value and $R^2$ value are given. (C) Normalised 
entropy rates (SR/maxSR, y-axis) between the 159 pluripotent and 32 
differentiated samples from the SCM2. P-value is from a Wilcoxon rank 
sum test. (D) Corresponding ROC curve plus AUC of network entropy 
discriminating pluripotent from differentiated cells. 
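
For readers unfamiliar with the ROC/AUC summary used above, the following sketch shows how an AUC can be computed directly from two groups of scores as the fraction of (pluripotent, differentiated) pairs in which the pluripotent sample has the higher value. The entropy values are made up for illustration, and the authors' exact ROC procedure is described in the SI and may differ.

```python
def auc(positive_scores, negative_scores):
    """Fraction of positive/negative pairs ranked correctly; ties count as one half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Made-up normalised entropy rates, not taken from the paper.
pluripotent = [0.91, 0.93, 0.92, 0.95]
differentiated = [0.84, 0.86, 0.88]

perfect = auc(pluripotent, differentiated)      # groups fully separated
overlap = auc([0.90, 0.85], [0.88, 0.80])       # one of four pairs misordered
print(perfect, overlap)
```

An AUC of 1 means that every pluripotent sample scores above every differentiated sample, which is what perfect discrimination by a single threshold requires.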

Next, we compared the network entropy of hESCs to that of 
committed but multipotent cell types, including neural stem cells (NSCs), 
hematopoietic stem cells (HSCs) and mesenchymal stem cells 
(MSCs). Confirming our hypothesis, all of these stem cell types 
exhibited entropies which were lower than those of hESCs/iPSCs, 
but higher than those of their differentiated progeny (Fig. 3A, SI, figs. S8-S9). 
Thus, network entropy can discriminate cells within a lineage 
according to their differentiation status. To test this further, in a 
combined haematological data set 33, encompassing a number of 
different blood cell types including differentiated types (e.g. 
monocytes), and less differentiated ones (e.g. CD34+ HSCs and 
erythroblasts/megakaryocytes), network entropy recapitulated a 
differentiation hierarchy consistent with prior knowledge 34,35 (see SI, fig. S10). 
Importantly, we observed that network entropy was a relatively 
robust measure, being fairly insensitive to the normalisation or 
platform used (SI, figs. S8-S11), although in the case of MSCs biological 
variations were evident (SI, fig. S8) 36,37. 

Network entropy is reduced during differentiation. If network 
entropy is a general measure of the undifferentiated state of cells, it 
ought to exhibit dynamic changes in time course differentiation data. 
To this end, we considered expression data of differentiated retinal 
pigment epithelial cells, which were induced to de-differentiate, 
followed by a period of re-differentiation (SI). Remarkably, network 
entropy increased upon de-differentiation, reaching a maximum, 
with values subsequently dropping upon re-differentiation (Fig. 3B). 
As another example, we considered a time course data set consisting 
of human promyelocytic leukemia progenitor (HL60) cells, 
differentiating into neutrophils 38. There were two separate time courses, 
using distinct stimuli to induce differentiation of HL60 cells. In 
both cases, network entropy was significantly reduced with time 
(ATRA stimulus, $R^2 = 0.96$, $P < 10^{-8}$, Fig. 3B). Once again, these 
dynamic changes were independent of cell-proliferation (SI, fig. S12). 

Network entropy discriminates cancer stem cells, cancer and 
normal cells. Differentiation status is a key feature distinguishing cancer 
from normal cells, with cancer representing a less differentiated and 
more heterogeneous state. Confirming this, network entropy was 
consistently higher in cancer tissue compared to normal cells, 
across four different tissue types, with cancer cell-lines exhibiting 
even higher values (Fig. 4A). We further analysed an expression 
data set profiling putative cancer stem cells (CSCs) and their 
parental cancers across a number of different tissues 39. This showed 
that CSCs exhibited a marginally higher network entropy than their 
non-stem like counterparts, consistent with the view that CSCs retain 
a higher level of plasticity (Fig. 4B). 

Interestingly, comparing the network entropy of hESCs to 
teratocarcinomas and germ cell tumours (all from the SCM and all deemed 
pluripotent), revealed marginally higher values in the hESCs (SI, fig. 
S13). This pattern of higher network entropy in normal stem cells 
was also seen in the non-pluripotent context: for instance, the 
network entropy of HSCs and NSCs was, in general, higher than that of 
leukemic stem and glioma stem cells, respectively (SI, figs. S13-S14). 
Thus, while CSCs and ordinary cancer cells exhibit significantly 
increased cellular heterogeneity compared to normal differentiated 
tissue, CSCs do not appear to exhibit higher values relative to their 
normal stem cell counterparts, and even appear to show reduced 
levels of entropy compared to normal stem cells. 
Figure 3 | Network entropy marks differentiation potential. (A) Multi-lineage analysis: Left panel: Comparison of normalised network entropy values of 
hESCs, hematopoietic stem cells (HSCs), T & B-cell lymphocytes plus natural killer cells (LYMPH/NKC), and monocytes plus neutrophils (MC/PMN). 
Middle panel: Comparison of normalised network entropy values of hESCs, mesenchymal stem cells (MSCs) and differentiated osteoblasts (OST) and 
chondrocytes (CHO). Right panel: Comparison of normalised network entropy values of hESCs, neural stem cells (NSCs) derived from the hESCs, fetal 
neural stem cells (FNSC) and primary astrocytes (AC), as derived from the SCM compendium (Illumina arrays). Wilcoxon rank sum test P-values 
between consecutive groups in the differentiation hierarchy are given. (B) Dynamic changes in network entropy: Left panel: Network entropy changes in a 
time course de-differentiation and re-differentiation experiment of retinal pigment epithelium (RPE), with cell density indicating the initial plating 
density of RPE cells. Right panels: Network entropy rate (SR/maxSR, y-axis) changes of HL60 leukemic progenitor cells against time from initial stimulus 
with either ATRA or DMSO. The data points on the left indicate the less differentiated HL60 cells, whereas the ones on the far right represent differentiated 
neutrophils. We provide the $R^2$ values and associated P-values from a linear regression. 



[Figure 4 graphic. Panel (A): Liver, Stomach, Pancreas, Colon. Panel (B): Breast, Brain, Lung, Oral, Colon, each comparing CSC with PTC; combined Fisher-test P=0.02. Y-axis: SR/maxSR.]

Figure 4 | Network entropy discriminates cancer stem cells, cancer and normal tissue. (A) Comparison of normalised entropy rates (SR/maxSR) 
between normal and cancer tissue, as well as cancer cell lines, across four different tissue types, as indicated. (B) Comparison of normalised entropy rates 
between putative cancer stem cells (CSC) and their parental tumour cell lines (PTC) for five different tissue types. Combined Fisher test P-value is given. 
Dynamic changes in local network entropy identify key 
differentiation genes and pathways. To demonstrate that the dynamic 
changes in entropy can be related to changes in activation of 
specific pathways, we considered, as a proof of principle, the 
case of Notch signaling. Notch signaling is inactive yet inducible 
in the pluripotent state, with activation normally associated with 
differentiation 40-45. Thus, essential components of the Notch 
signaling pathway should exhibit a lower network entropy in the 
non-pluripotent state. Using data from the stem cell matrix 11, we 
were able to confirm this for 12 of the 13 Notch pathway genes (SI, 
figs. S15-S16). To confirm the statistical significance of this, in none 
of 10000 random selections of 13 genes from the PIN did we observe 
the same level of consistency and statistical significance as for the 
Notch pathway genes (P < 0.0001), indicating that reduced entropy 
of the Notch pathway is a key feature of the non-pluripotent state (SI, 
fig. S17). It is also important to demonstrate that the interactors 
driving the lower entropy of Notch genes are other Notch-pathway 
genes. For many Notch genes (e.g. NOTCH2, NOTCH3, DLL1, JAG1, 
PSENEN, APH1A, APH1B) this was indeed the case, despite the fact 
that there were also many non-Notch pathway interactors present 
(SI, figs. S16,S18). 
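
The random gene-set comparison described above follows the general pattern of a Monte Carlo significance test, which can be sketched as follows. This is a hedged illustration only: the summary statistic (the mean change in local entropy over the gene set), the background scores and the observed value are all hypothetical, and the statistic actually used by the authors is described in the SI.

```python
import random

def empirical_p_value(observed, background, set_size, n_draws, seed=0):
    """Count random gene sets whose mean score is at least as low as the observed one.

    Lower scores are treated as more extreme, since the text looks for entropy reductions.
    Returns (hits, hits / n_draws).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(background, set_size)  # random gene set, without replacement
        if sum(draw) / set_size <= observed:
            hits += 1
    return hits, hits / n_draws

# Hypothetical per-gene entropy changes in [-1, 1]; not real data.
background = [((i * 37) % 101 - 50) / 50 for i in range(200)]

# An observed mean below anything the background can produce gives zero hits.
hits, fraction = empirical_p_value(observed=-1.5, background=background,
                                   set_size=13, n_draws=1000)
print(hits, fraction)
```

When no random set out of N draws matches the observed statistic, the result is reported as P < 1/N, which is how zero hits in 10000 selections corresponds to P < 0.0001.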

To further test the added value of local network entropy, we 
revisited the HL60 to neutrophil time course data. Using linear 
regressions we identified the genes showing the most significant decreases 
and increases in network entropy. Ranking genes according to those 
showing the largest reductions in network entropy and performing a 
subsequent Gene Set Enrichment Analysis (GSEA), we identified 
JAK-STAT signalling as one of the key pathways (SI, figs. S19-S20). 
The involvement of this pathway is heavily supported by 
previous studies 46-49. Attesting to the statistical significance of the 
JAK-STAT pathway, computing entropies after randomly 
permuting the gene expression profiles over the nodes in the network led to 
no significantly enriched biological terms (adjusted P-values > 0.05). 
This is an important result because it shows that the dynamic 
network entropy changes inferred from the integrated PIN are indeed 
targeting specific signaling pathways. Finally, using non-network 
based approaches did not identify the JAK-STAT pathway (SI, 
fig. S19). 

Discussion 

Here we have taken a systems analysis view of cellular differentiation, 
proposing the concept that network entropy is inversely correlated 
with the differentiation status of a sample. By computing the network 
entropy of over 800 samples, encompassing cell types from many 
diverse cell-lineages and differentiation stages, and profiled using a 
variety of different microarray platforms, we have demonstrated that 
entropy provides a near-absolute quantitative measure of the 
undifferentiated state of any given sample. 

In the context of normal physiology, hESCs and other pluripotent 
cell types were correctly predicted to exhibit the highest levels of 
network entropy, followed by multipotent stem cells (e.g. NSC/ 
HSC/MSC), with terminally differentiated cells exhibiting 
significantly lower entropy (Fig. 5). In the context of cancer, CSCs exhibited 
higher levels of cellular entropy than ordinary cancer cells, although 
this difference appears substantially reduced in comparison to what 
is observed between normal stem cells and their differentiated 
progeny (Fig. 5). Cancer cell lines exhibited a higher entropy than 
primary cancers, with cancer tissue possessing higher values than normal 
tissue (Fig. 5). All these findings are consistent with network entropy 
being a direct measure of the average intrasample cellular 
heterogeneity, supporting the view that cellular states such as pluripotency are a 
statistical property of a cell population 6,22. Indeed, although we have 
not analysed genome-wide single-cell expression data, it is highly 
plausible that the degree of cellular heterogeneity is determined by 
the level of signaling promiscuity, and hence stochasticity, in single 
cells 6,22. The observation that cancer stem cells exhibit a high but 
marginally lower network entropy than their normal stem cell 
counterparts is also consistent with the view that CSCs must be 
characterised by oncogenic pathway dependencies, which, as shown in a 
previous study, lead to a lowering of network entropy 26. Local 
entropy analyses aimed at identifying the specific oncogenic 
pathways driving the lower entropy in CSCs could thus offer novel 
therapeutic opportunities 26. 

Figure 5 | Network entropy rates between major cell types in normal and 
cancer physiology. Network entropy correlates with pluripotency and 

It is important to stress again that network entropy provides a very 
general systems measure of the undifferentiated state of a sample. In 
this regard, we remark that reported pluripotency expression 
signatures 12,15, which lack a systems-level interpretation and 
understanding, could only consistently discriminate pluripotent from 
non-pluripotent cell types, but generally failed to discriminate cell 
types located further down the differentiation hierarchy, irrespective 
of normal or cancer physiology (SI, figs. S21-S27). Thus, the fact that 
network entropy provides a more refined classification of the distinct 
cell types across the global differentiation hierarchy, and that it did so 
independently of cell-proliferation indices, attests to the biological 
importance of this measure and of the statistical mechanical 
framework on which it is based. 

Although we observed some variation in entropy rates between 
studies profiling the same cell types using the same technology, it is 
important to note that these variations were in 
general small and that network entropy provided a relatively robust 
measure of the undifferentiated cellular state: for instance, hESCs 
always showed the highest levels of network entropy, irrespective 
of study or platform. This robustness stems from two key features. 
First, network entropy is a self-calibrating measure, as it is 
constructed by taking ratios of gene expression intensity values. This 
makes it a dimensionless quantity and fairly insensitive to the 
microarray or normalisation method used, unlike the scores derived from 
pluripotency signatures which showed significant variations between 
studies (see SI, fig. S28). Second, network entropy is not affected by 
overfitting since it is a quantity which does not depend on feature 
selection. Thus, unlike pluripotency expression signatures 12,14,15, 
network entropy does not depend on tunable parameters. It follows 
that network entropy could provide a simple, general and robust 
quantitative test for assessing the pluripotency or multipotency of 
a cellular sample. For instance, it could be used to assess the quality of 
iPSCs in reprogramming experiments or even to identify mislabeled 
samples. 
Since a sample's network entropy is computed from integration of 
its genome-wide expression profile with a protein interaction 
network, it is important to also comment on the robustness of the results 
in relation to the network, and more importantly on the number of 
genes that are measured. Considering the HL60 differentiation time 
course data set as a test case, we observed that randomly subsampling 
from the underlying integrated network and recomputing the 
entropy rates for the resulting maximally connected components 
still resulted in significant decreases of the entropy rate with 
differentiation stage, as long as we subsampled at least 40% of genes in the 
network (SI, fig. S29). That the association between network entropy 
and differentiation stage is robust to subsampling indicates that the 
dynamic changes in entropy are driven by a subtle interplay between 
the gene expression changes and the topological properties of the 
nodes exhibiting these changes. We leave investigation of this and 
other aspects to a future study. 
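
The subsampling step described above can be sketched as follows, assuming that "maximally connected component" refers to the largest connected component of the subsampled network; that reading, the toy network and the function names are assumptions made for illustration. The entropy rate would then be recomputed on the returned node set.

```python
import random
from collections import deque

def largest_component(neighbours, keep):
    """Largest connected component of the subgraph induced by the node set keep."""
    seen, best = set(), set()
    for start in sorted(keep):  # sorted for a reproducible tie-break
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search restricted to kept nodes
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt in keep and nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        if len(component) > len(best):
            best = component
    return best

def subsample(neighbours, fraction, seed=0):
    """Keep a random fraction of nodes, then return the largest connected component."""
    rng = random.Random(seed)
    nodes = sorted(neighbours)
    keep = set(rng.sample(nodes, max(1, round(fraction * len(nodes)))))
    return largest_component(neighbours, keep)

# Hypothetical toy network: a path of five genes.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(sorted(largest_component(path, {"a", "b", "c", "e"})))
print(sorted(subsample(path, 0.4)))
```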

In summary, we have proposed a relatively simple, computable, 
systems property of a genome-wide expression profile, called 
network entropy, which provides an estimate of signaling promiscuity 
and cellular heterogeneity, and which correlates with the 
undifferentiated state of cells. Network entropy may thus serve as a 
quantitative in-silico proxy for a sample's differentiation potential in 
Waddington's epigenetic landscape. 

Methods 

Full details of the data sets, interaction network and all statistical methods used are 
provided in SI. Below, we give a brief sketch of how network entropy is calculated. 

Construction of the sample specific stochastic matrix and network entropy rate. 

The sample specific stochastic matrix is estimated by integrating the gene expression 
profile of the sample with a comprehensive protein interaction network. Specifically, 
we invoke the mass action principle: let $E_i$ denote the normalised expression level of 
gene i in a given sample. For a given neighbour $j \in N(i)$ (where N(i) labels the 
neighbours of i in the PIN), the mass-action principle states that the probability of 
interaction with i is approximated by the product $E_i E_j$. Normalising this to ensure that 
$\sum_j p_{ij} = 1$, we get 

$$p_{ij} = \frac{E_j}{\sum_{k \in N(i)} E_k} \quad \forall j \in N(i) \qquad (2)$$

Clearly, if $j \notin N(i)$, then $p_{ij} = 0$. This then defines a sample-specific stochastic matrix. 
From this stochastic matrix one can then construct a local network entropy for each 
gene i in the PIN, as 

$$S_i = -\sum_{j \in N(i)} p_{ij} \log p_{ij} \qquad (3)$$

which reflects the level of uncertainty or promiscuity in the local interaction 
probabilities around gene i. We note that the above expression for the local entropy is 
not normalised so that the maximum possible entropy depends on the degree ($k_i$) of 
the node i. In fact, $\max S_i = \log k_i$. Thus, it is convenient to also define a normalised 
local entropy as (see 25), 

$$\tilde{S}_i = -\frac{1}{\log k_i} \sum_{j \in N(i)} p_{ij} \log p_{ij} \qquad (4)$$
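
As a worked illustration of equations (2)-(4), consider a hypothetical gene i with exactly two neighbours whose normalised expression levels are assumed to be $E_1 = 1$ and $E_2 = 3$ (made-up values; natural logarithms are used, although the normalised entropy does not depend on the base):

```latex
\begin{align*}
p_{i1} &= \frac{1}{1+3} = 0.25, \qquad p_{i2} = \frac{3}{1+3} = 0.75,\\
S_i &= -\left(0.25 \log 0.25 + 0.75 \log 0.75\right) \approx 0.562,\\
\tilde{S}_i &= \frac{S_i}{\log k_i} = \frac{S_i}{\log 2} \approx \frac{0.562}{0.693} \approx 0.811 .
\end{align*}
```

If both neighbours were equally expressed, the two probabilities would each be 0.5 and the normalised local entropy would reach its maximum of 1.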



We stress again that this local network entropy can be computed for each gene i in 
each given sample. When defining a global network entropy (i.e. for the whole 
network) one can, in principle, consider the average of these normalised local 
entropies. This average however is a nonequilibrium entropy 26, in contrast to the 
global entropy rate, $S_R$, which is defined in terms of the stationary distribution, $\pi$, of 
the stochastic matrix p, i.e. through $\pi p = \pi$. Specifically, the global entropy rate, $S_R$, is 
defined by 29,30 

$$S_R = \sum_i \pi_i S_i \qquad (5)$$

where $S_i$ are the unnormalised local entropies. We note that the network entropy rate 
is bounded between 0 and a positive maximum value that depends only on the 
adjacency matrix of the network 50. Indeed, it can be shown that the maximum 
possible entropy rate is attained by a stochastic matrix, $p_{ij}$, defined by $p_{ij} = A_{ij} v_j / (\lambda v_i)$, 
where $A_{ij}$ is the adjacency matrix (i.e. unweighted) of the PIN, and $v$ and $\lambda$ are the 
dominant eigenvector and eigenvalue of this adjacency matrix, respectively. The 
maximum attainable entropy rate, $M_R$, will thus depend on the specifics of the 
network, including total number of genes, edges and topology. Thus, to facilitate 
comparison between networks, the network entropy rate, $S_R$, can be scaled relative to 
the maximum attainable value in that given network, $\tilde{S}_R = S_R / M_R$, so that $\tilde{S}_R$ is always 
bounded between 0 and 1. In this work, all reported entropy rates have been 
normalised in this way. 
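
The normalisation step can be sketched as follows, under stated assumptions: the toy adjacency lists are hypothetical, the dominant eigenpair is found by power iteration on A + I (a shift chosen here to avoid oscillation on bipartite graphs, not something the text specifies), and the sample being normalised is simply the unbiased random walk. The sketch also relies on the standard result that the maximal entropy rate equals log of the dominant eigenvalue, which the code checks numerically.

```python
import math

def dominant_eigenpair(adj, iterations=500):
    """Dominant eigenvalue and unit-norm eigenvector of the adjacency matrix."""
    v = {i: 1.0 for i in adj}
    for _ in range(iterations):
        w = {i: v[i] + sum(v[j] for j in adj[i]) for i in adj}  # (A + I) v
        norm = math.sqrt(sum(x * x for x in w.values()))
        v = {i: x / norm for i, x in w.items()}
    lam = sum(v[i] * sum(v[j] for j in adj[i]) for i in adj)  # Rayleigh quotient
    return lam, v

def max_entropy_rate(adj):
    """Entropy rate M_R of the walk p_ij = A_ij v_j / (lambda v_i); returns (M_R, lambda)."""
    lam, v = dominant_eigenpair(adj)
    rate = 0.0
    for i in adj:
        pi_i = v[i] ** 2  # stationary distribution of this walk (v has unit norm)
        for j in adj[i]:
            p_ij = v[j] / (lam * v[i])
            rate -= pi_i * p_ij * math.log(p_ij)
    return rate, lam

def uniform_walk_rate(adj):
    """Entropy rate of the unbiased walk: pi_i = k_i / sum(k), S_i = log k_i."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    return sum(k * math.log(k) for k in degrees) / sum(degrees)

# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
toy = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
m_r, lam = max_entropy_rate(toy)
normalised = uniform_walk_rate(toy) / m_r
print(f"M_R = {m_r:.4f}, normalised entropy rate of unbiased walk = {normalised:.4f}")
```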

We note that computation of the entropy rate is computationally intensive as it 
requires estimation of the stationary distribution of a large stochastic matrix. For a 
connected network of size 8290 nodes, computation of a sample's entropy rate takes 
approximately 10 minutes on a Dell Precision T5400 workstation. R scripts performing the 
computations are freely available on request. 